Human modeling and relighting are two fundamental problems in computer vision and graphics, where high-quality datasets can greatly facilitate related research. However, most existing human datasets only provide multi-view human images captured under the same illumination. Although valuable for modeling tasks, they are not readily usable for relighting problems. To promote research in both fields, we present UltraStage, a new 3D human dataset that contains more than 2K high-quality human assets captured under both multi-view and multi-illumination settings. Specifically, for each example, we provide 32 surrounding views illuminated with one white light and two gradient illuminations. In addition to regular multi-view images, gradient illuminations help recover detailed surface normal and spatially-varying material maps, enabling various relighting applications. Inspired by recent advances in neural representation, we further convert each example into a neural human asset that allows novel view synthesis under arbitrary lighting conditions. We show that our neural human assets achieve extremely high capture performance and can represent fine details such as facial wrinkles and cloth folds. We also validate UltraStage in single-image relighting tasks, training neural networks with virtually relit data from the neural assets and demonstrating more realistic rendering than prior art. UltraStage will be made publicly available to the community to stimulate significant future developments in various human modeling and rendering tasks.
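The abstract does not spell out how normals are recovered from the two gradient illuminations. A minimal sketch of one common formulation follows, assuming (this is an assumption, not a detail from the paper) a Lambertian surface, a color gradient pattern whose R, G, B channels encode the x, y, z light direction, and a second capture under the complementary pattern. The function names `simulate_gradient_pair` and `normals_from_gradient_pair` are hypothetical.

```python
import numpy as np

def simulate_gradient_pair(normal, albedo):
    """Ideal Lambertian response under a spherical gradient pattern
    P(w) = (1 + w_i) / 2 and its complement (1 - w_i) / 2.
    This is an illustrative forward model, not the dataset's capture code."""
    normal = np.asarray(normal, dtype=np.float64)
    constant = np.pi * albedo                        # response to full-on light
    linear = (2.0 * np.pi * albedo / 3.0) * normal   # response to the w_i term
    return 0.5 * (constant + linear), 0.5 * (constant - linear)

def normals_from_gradient_pair(img_grad, img_comp, eps=1e-8):
    """Estimate unit normals from a gradient image and its complement.
    Inputs have shape (..., 3); the last axis holds the x, y, z channels."""
    img_grad = np.asarray(img_grad, dtype=np.float64)
    img_comp = np.asarray(img_comp, dtype=np.float64)
    # The difference isolates the direction-dependent term; the sum cancels albedo.
    ratio = (img_grad - img_comp) / (img_grad + img_comp + eps)
    length = np.linalg.norm(ratio, axis=-1, keepdims=True)
    return ratio / np.maximum(length, eps)

if __name__ == "__main__":
    true_normal = np.array([0.6, 0.0, 0.8])
    grad_img, comp_img = simulate_gradient_pair(true_normal, albedo=0.5)
    print(normals_from_gradient_pair(grad_img, comp_img))
```

Under these assumptions the per-channel ratio equals (2/3) times the normal component, so normalizing the ratio vector gives the normal regardless of albedo. Real skin and cloth are not purely Lambertian, so a practical pipeline would need to handle specular reflection separately.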
Humans constantly interact with objects in daily life tasks. Capturing such processes and subsequently conducting visual inference from a fixed viewpoint suffers from occlusions, shape and texture ambiguities, motions, etc. To mitigate the problem, it is essential to build a training dataset that captures free-viewpoint interactions. We construct a dense multi-view dome to acquire a complex human-object interaction dataset, named HODome, that consists of $\sim$75M frames of 10 subjects interacting with 23 objects. To process the HODome dataset, we develop NeuralDome, a layer-wise neural processing pipeline tailored for multi-view video inputs to conduct accurate tracking, geometry reconstruction, and free-view rendering for both human subjects and objects. Extensive experiments on the HODome dataset demonstrate the effectiveness of NeuralDome on a variety of inference, modeling, and rendering tasks. Both the dataset and the NeuralDome tools will be disseminated to the community for further development.
Neural network pruning has been a well-established compression technique to enable deep learning models on resource-constrained devices. The pruned model is usually specialized to meet specific hardware platforms and training tasks (defined as deployment scenarios). However, existing pruning approaches rely heavily on training data to trade off model size, efficiency, and accuracy, which becomes ineffective for federated learning (FL) over distributed and confidential datasets. Moreover, the memory- and compute-intensive pruning process of most existing approaches cannot be handled by most FL devices with resource limitations. In this paper, we develop FedTiny, a novel distributed pruning framework for FL, to obtain specialized tiny models for memory- and computing-constrained participating devices with confidential local data. To alleviate biased pruning due to unseen heterogeneous data over devices, FedTiny introduces an adaptive batch normalization (BN) selection module to adaptively obtain an initially pruned model to fit deployment scenarios. Besides, to further improve the initial pruning, FedTiny develops a lightweight progressive pruning module for local finer pruning under tight memory and computational budgets, where the pruning policy for each layer is gradually determined rather than evaluating the overall deep model structure. Extensive experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art baseline approaches, especially when compressing deep models to extremely sparse tiny models.
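For readers unfamiliar with what compressing a model to an "extremely sparse" one means in practice, here is a minimal sketch of generic magnitude pruning to a target density. It is a textbook baseline, not FedTiny's adaptive BN selection or progressive pruning module; the function name `magnitude_prune` and the `density` parameter are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, density):
    """Keep the `density` fraction of weights with the largest magnitude
    and zero out the rest. Returns (pruned_weights, boolean_mask)."""
    w = np.asarray(weights, dtype=np.float64)
    k = int(round(density * w.size))        # number of weights to keep
    if k <= 0:
        return np.zeros_like(w), np.zeros(w.shape, dtype=bool)
    k = min(k, w.size)
    flat_magnitude = np.abs(w).ravel()
    # argpartition places the indices of the k largest magnitudes in the last k slots.
    keep = np.argpartition(flat_magnitude, -k)[-k:]
    mask = np.zeros(w.size, dtype=bool)
    mask[keep] = True
    mask = mask.reshape(w.shape)
    return w * mask, mask

if __name__ == "__main__":
    layer = np.array([[0.1, -2.0], [0.5, -0.05]])
    pruned, mask = magnitude_prune(layer, density=0.5)
    print(pruned)
    print(mask)
```

A baseline like this needs the full dense weights in memory and says nothing about data heterogeneity, which are the two limitations the abstract identifies for federated settings.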
The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In this paper, we first extend the concept of adversarial examples to the game of Go: we generate perturbed states that are ``semantically'' equivalent to the original state by adding meaningless moves to the game, and an adversarial state is a perturbed state leading to an undoubtedly inferior action that is obvious even to Go beginners. However, searching for adversarial states is challenging due to the large, discrete, and non-differentiable search space. To tackle this challenge, we develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space. This method can also be extended to other board games such as NoGo. Experimentally, we show that the actions taken by both the Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, on 58\% of the AlphaGo Zero self-play games, our method can make the widely used KataGo agent with 50 simulations of MCTS play a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players; 90\% of the examples indeed lead the Go agent to play an obviously inferior action. Our code is available at \url{https://PaperCode.cc/GoAttack}.
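We have recently seen tremendous progress in neural approaches to photo-realistic human modeling and rendering. However, it remains challenging to integrate them into an existing mesh-based pipeline for downstream applications. In this paper, we present a comprehensive neural approach for high-quality reconstruction, compression, and rendering of human performances from dense multi-view videos. Our core intuition is to bridge the traditional animated mesh workflow with a family of highly efficient neural techniques. We first introduce a neural surface reconstructor for high-quality surface generation in minutes. It combines implicit volumetric rendering of the truncated signed distance field (TSDF) with multi-resolution hash encoding. We further propose a hybrid neural tracker to generate animated meshes, which combines explicit non-rigid tracking with implicit dynamic deformation in a self-supervised framework. The former provides the coarse warping back into the canonical space, while the latter, implicit component further predicts displacements using 4D hash encoding, as in our reconstructor. We then discuss rendering schemes that use the obtained animated meshes, ranging from dynamic texturing to lumigraph rendering under various bandwidth settings. To strike an intricate balance between quality and bandwidth, we propose a hierarchical solution that first renders 6 virtual views covering the performer and then conducts occlusion-aware neural texture blending. We demonstrate the efficacy of our approach in a variety of mesh-based applications and photo-realistic free-view experiences on various platforms, i.e., inserting virtual human performances into real environments through mobile AR, or immersively watching talent shows with VR headsets.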
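Existing distantly supervised relation extractors usually rely on noisy data for both model training and evaluation, which may lead to garbage-in-garbage-out systems. To alleviate the problem, we study whether a small clean dataset can help improve the quality of distantly supervised models. We show that, besides enabling a more convincing evaluation of models, a small clean dataset also helps us build more robust denoising models. Specifically, we propose a new criterion for clean instance selection based on influence functions. It collects sample-level evidence for recognizing good instances (which is more informative than loss-level evidence). We also propose a teacher-student mechanism to control the purity of intermediate results when bootstrapping the clean set. The whole approach is model-agnostic and demonstrates strong performance in denoising both real (NYT) and synthetic noisy datasets.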
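Low-light image enhancement is a key preprocessing task for some complex vision tasks. The results of object detection, image segmentation, and image recognition are all directly affected by image enhancement. However, most image enhancement techniques currently in use do not produce satisfactory results, and these enhancement networks have relatively weak robustness. As a solution, we propose an improved network that uses U-Net as its main structure and incorporates a number of different attention mechanisms. In the specific application, we use the network as the generator and LSGAN as the training framework to obtain better enhancement results. We demonstrate the effectiveness of the proposed network, Brightennet, in the experiments that follow. The results it produces both preserve image details and conform to human visual standards.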
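The abstract names LSGAN as the training framework without giving the objective. As a sketch of the standard least-squares GAN losses, assuming the common label choice of 1 for real and 0 for fake (the abstract does not state which labels or loss weights this paper uses), the two terms can be written as follows. The function names are hypothetical.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake, real_label=1.0, fake_label=0.0):
    """Least-squares discriminator loss: push scores on real images toward
    real_label and scores on generated images toward fake_label."""
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    loss = 0.5 * np.mean((d_real - real_label) ** 2)
    loss += 0.5 * np.mean((d_fake - fake_label) ** 2)
    return float(loss)

def lsgan_generator_loss(d_fake, real_label=1.0):
    """Least-squares generator loss: push discriminator scores on
    generated (enhanced) images toward real_label."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(0.5 * np.mean((d_fake - real_label) ** 2))

if __name__ == "__main__":
    scores_real = np.array([0.9, 1.1])   # discriminator outputs on normal-light images
    scores_fake = np.array([0.2, -0.1])  # discriminator outputs on enhanced images
    print(lsgan_discriminator_loss(scores_real, scores_fake))
    print(lsgan_generator_loss(scores_fake))
```

Compared with the sigmoid cross-entropy GAN loss, the squared penalty keeps producing gradient for generated samples that the discriminator already classifies correctly but that lie far from the real data, which is the usual motivation for choosing it.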
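In this work, we propose NARRATE, a novel pipeline that enables simultaneous editing of portrait lighting and perspective in a photorealistic manner. As a hybrid neural-physical face model, NARRATE leverages the complementary benefits of geometry-aware generative approaches and normal-assisted physical face models. In a nutshell, NARRATE first inverts the input portrait to a coarse geometry and employs neural rendering to generate images that resemble the input and exhibit convincing pose changes. However, the inversion step introduces a mismatch, yielding low-quality images with fewer facial details. We therefore further estimate the portrait normal to enhance the coarse geometry, creating a high-fidelity physical face model. In particular, we fuse the neural and physical renderings to compensate for the imperfect inversion, resulting in realistic and view-consistent novel-perspective images. In the relighting stage, previous works focus on single-view portrait relighting but ignore consistency between different perspectives, leading to unstable and inconsistent lighting effects under view changes. We address this problem by unifying the multi-view input normal maps with the physical face model. NARRATE conducts relighting with consistent normal maps, imposing cross-view constraints and exhibiting stable and coherent illumination effects. We experimentally demonstrate that NARRATE achieves more photorealistic and reliable results than prior works. We further bridge NARRATE with animation and style transfer tools, supporting pose change, light change, facial animation, and style transfer, either separately or in combination, all at photographic quality. We showcase vivid free-view facial animations as well as 3D-aware relightable stylization, which help facilitate various AR/VR applications such as virtual cinematography, 3D video conferencing, and post-production.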
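Whereas conventional robotic systems require a dedicated scene representation for each different task, this paper demonstrates that a unified representation can be used directly for multiple key tasks. We propose the Log-Gaussian Process Implicit Surface for Mapping, Odometry and Planning (Log-GPIS-MOP): a probabilistic framework for surface reconstruction, localization, and navigation based on a unified representation. Our framework applies a logarithmic transformation to a Gaussian Process Implicit Surface (GPIS) formulation to recover a global representation that accurately captures the Euclidean distance field with gradients and, at the same time, the implicit surface. By directly estimating the distance field and its gradient through Log-GPIS inference, the proposed incremental odometry technique computes the optimal alignment of an incoming frame and fuses it globally to produce a map. Concurrently, an optimization-based planner computes a safe, collision-free path using the same Log-GPIS surface representation. We validate the proposed framework on simulated and real datasets in 2D and 3D and benchmark it against state-of-the-art approaches. Our experiments show that Log-GPIS-MOP produces competitive results in sequential odometry, surface mapping, and obstacle avoidance.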
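The abstract does not give the form of the logarithmic transformation. One way such a transform can recover distances, assuming (this is an assumption for illustration, not a statement of the paper's exact formulation) that the regressed latent field behaves like v(x) = exp(-d(x) / sqrt(t)) with v = 1 on the surface and a small hypothetical scale parameter `t`, is sketched below. The GP regression itself is omitted and the function names are hypothetical.

```python
import numpy as np

def distance_from_latent(v, t):
    """Invert v = exp(-d / sqrt(t)) to recover the distance d."""
    v = np.asarray(v, dtype=np.float64)
    return -np.sqrt(t) * np.log(v)

def distance_gradient_from_latent(v, grad_v, t):
    """Chain rule on d = -sqrt(t) * ln(v): grad d = -sqrt(t) * grad v / v.
    v has shape (N,), grad_v has shape (N, D)."""
    v = np.asarray(v, dtype=np.float64)
    grad_v = np.asarray(grad_v, dtype=np.float64)
    return -np.sqrt(t) * grad_v / v[:, None]

if __name__ == "__main__":
    t = 0.01
    x = np.array([-0.5, -0.1, 0.2, 0.4])     # 1D query points; surface at x = 0
    v = np.exp(-np.abs(x) / np.sqrt(t))      # synthetic latent field
    grad_v = (-np.sign(x) / np.sqrt(t) * v)[:, None]
    print(distance_from_latent(v, t))                    # distances to the surface
    print(distance_gradient_from_latent(v, grad_v, t))   # unit-length directions
```

In this synthetic 1D case the recovered gradient has unit length, which is the property an odometry or planning module would rely on when using the field as a Euclidean distance function.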
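The key to human-object interaction (HOI) recognition is to infer the relationship between humans and objects. Recently, image-based HOI detection has made significant progress. However, there is still room to improve the performance of video HOI detection. Existing one-stage methods use carefully designed end-to-end networks to detect a video segment and directly predict an interaction. This makes model learning and further optimization of the network more complex. This paper introduces the Spatial Parsing and Dynamic Temporal Pooling (SPDTP) network, which takes the entire video as input in the form of a spatio-temporal graph with human and object nodes. Unlike existing methods, our proposed network predicts the difference between interactive and non-interactive pairs through explicit spatial parsing and then performs interaction recognition. Moreover, we propose a learnable and differentiable Dynamic Temporal Module (DTM) to emphasize the keyframes of the video and suppress redundant frames. Furthermore, the experimental results show that SPDTP pays more attention to active human-object pairs and valid keyframes. Overall, we achieve state-of-the-art performance on the CAD-120 and Something-Else datasets.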